The Role of Quasi-identifiers in k-Anonymity Revisited

Authors

  • Claudio Bettini
  • Xiaoyang Sean Wang
  • Sushil Jajodia
Abstract

The concept of k-anonymity, used in the recent literature (e.g., [10, 11, 7, 5, 1]) to formally evaluate the privacy preservation of published tables, was introduced in the seminal papers of Samarati and Sweeney [10, 11] based on the notion of quasi-identifiers (QI for short). Obtaining k-anonymity for a given private table takes two steps: first recognize the QIs in the table, and then anonymize the QI values; the second step is called k-anonymization. While k-anonymization is usually rigorously validated by the authors, the definition of QI remains mostly informal, and different authors seem to interpret the concept differently. The purpose of this short note is to provide a formal underpinning of QI and to examine the correctness and incorrectness of various interpretations of QI in our formal framework. We observe that in cases where the concept has been used correctly, its application has been conservative; this note provides a formal understanding of the conservative nature in such cases.

The notion of QI was perhaps first introduced by Dalenius in [3] to denote a set of attribute values in census records that may be used to re-identify a single individual or a group of individuals. To Dalenius, the case of multiple individuals being identified is potentially dangerous because of collusion. In [10, 11], the notion of QI is extended to a set of attributes whose (combined) values may be used to re-identify the individuals of the released information by using “external” sources. Hence, the appearance of QI attribute values in a published database
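As a rough illustration of the two-step process described in the abstract, the sketch below checks whether a table is k-anonymous with respect to a chosen set of QI attributes. It follows the common textbook reading (every combination of QI values must occur in at least k rows), not the formal framework developed in the note. The function name `is_k_anonymous`, the toy table, and its attribute names are all hypothetical.

```python
from collections import Counter


def is_k_anonymous(rows, qi, k):
    """Return True if every combination of QI values in `rows` occurs at least k times.

    rows: list of dicts mapping attribute name -> value (hypothetical layout)
    qi:   tuple of attribute names treated as the quasi-identifier
    k:    minimum group size
    """
    groups = Counter(tuple(row[a] for a in qi) for row in rows)
    return all(count >= k for count in groups.values())


# Hypothetical, already generalized toy table; values are invented for illustration.
table = [
    {"zip": "220**", "age": "30-39", "diagnosis": "flu"},
    {"zip": "220**", "age": "30-39", "diagnosis": "asthma"},
    {"zip": "221**", "age": "40-49", "diagnosis": "flu"},
    {"zip": "221**", "age": "40-49", "diagnosis": "diabetes"},
]

print(is_k_anonymous(table, ("zip", "age"), 2))
print(is_k_anonymous(table, ("zip", "age", "diagnosis"), 2))
```

The two calls differ only in which attributes are treated as the QI, and they give different answers for the same table. This is one way to see why an informal or inconsistent definition of QI matters for any k-anonymity claim.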


Similar resources

Anonymity: Formalisation of Privacy – k-anonymity

Microdata is the basis of statistical studies. If microdata is released, it can leak sensitive information about the participants, even if identifiers like name or social security number are removed. A proper anonymization for statistical microdata is essential. K-anonymity has been intensively discussed as a measure for anonymity in statistical data. Quasi identifiers are attributes that might...


Butterfly: Privacy Preserving Publishing on Multiple Quasi-Identifiers

Recently, privacy preserving data publishing has attracted significant interest in research. Most of the existing studies focus on only the situations where the data in question is published using one quasi-identifier. However, in a few important applications, a practical demand is to publish a data set on multiple quasi-identifiers for multiple users simultaneously, which poses several challen...


Privacy Issues for K-anonymity Model

K-anonymity is an approach used for preventing identity disclosure. Identity disclosure means that an individual is linked to a particular record in the published data and the individual’s sensitive data is accessed. Some important information, such as name, income details, medical status, and property details, is considered sensitive data (or sensitive attributes) because these data have to be kept secure fr...


A Rough Set Based Efficient l-diversity Algorithm

Most organizations publish microdata for a variety of purposes, including demographic and public health research. To protect the anonymity of the entities, data holders often remove or encrypt explicit identifiers. However, the released information often contains quasi-identifiers, which leak valuable information. Samarati and Sweeney introduced the concept of k-anonymity to handle this problem ...


kACTUS 2: Privacy Preserving in Classification Tasks Using k-Anonymity

k-anonymity is the method used for masking sensitive data which successfully solves the problem of re-linking of data with an external source and makes it difficult to re-identify the individual. Thus k-anonymity works on a set of quasi-identifiers (public sensitive attributes), whose possible availability and linking is anticipated from external dataset, and demands that the released da...



Journal:
  • CoRR

Volume abs/cs/0611035

Publication date 2006